59 research outputs found

    Novel chemometric approaches towards handling biospectroscopy datasets

    Chemometrics allows one to identify chemical patterns using spectrochemical information from biological materials, such as tissues and biofluids. This is of fundamental importance for overcoming limitations of traditional bioanalytical analysis, such as the need for laborious and extremely invasive procedures, high consumption of reagents, and expensive instrumentation. In biospectroscopy, a beam of light, usually in the infrared region, is projected onto the surface of a biological sample and, as a result, a chemical signature is generated containing the vibrational information of most of the molecules in that material. This can be performed in a single-spectrum or hyperspectral imaging fashion, where a resultant spectrum is generated for each position (pixel) on the surface of a biological material segment, hence allowing extraction of both spatial and spectrochemical information simultaneously. As an advantage, these methodologies are non-destructive, have a relatively low cost, and require minimal sample preparation. However, in biospectroscopy, large datasets containing complex spectrochemical signatures are generated. These datasets are processed by computational tools in order to resolve their signal complexity and then provide useful information that can be used for decision making, such as the identification of clustering patterns distinguishing disease from healthy control samples; differentiation of tumour grades; prediction of the categories of unknown samples; or identification of key molecular fragments (biomarkers) associated with the appearance of certain diseases, such as cancer. In this PhD thesis, new computational tools are developed in order to improve the processing of bio-spectrochemical data, providing better clinical outcomes for both spectral and hyperspectral datasets

    Establishing spectrochemical changes in the natural history of oesophageal adenocarcinoma from tissue Raman mapping analysis

    Raman spectroscopy is a fast and sensitive technique able to identify molecular changes in biological specimens. Herein, we report on three cases where Raman microspectroscopy was used to distinguish normal vs. oesophageal adenocarcinoma (OAC) (case 1) and Barrett’s oesophagus vs. OAC (cases 2 and 3) in a non-destructive and highly accurate fashion. Normal and OAC tissues were discriminated using principal component analysis plus linear discriminant analysis (PCA-LDA) with 97% accuracy (94% sensitivity and 100% specificity) (case 1); Barrett’s oesophagus vs. OAC tissues were discriminated with accuracies ranging from 98 to 100% (97–100% sensitivity and 100% specificity). Spectral markers responsible for class differentiation were obtained through the difference-between-mean spectrum for each group and the PCA loadings, where C–O–C skeletal mode in β-glucose (900 cm−1), lipids (967 cm−1), phosphodioxy (1296 cm−1), deoxyribose (1456 cm−1) and collagen (1445, 1665 cm−1) were associated with normal and OAC tissue differences. Phenylalanine (1003 cm−1), proline/collagen (1066, 1445 cm−1), phospholipids (1130 cm−1), CH2 angular deformation (1295 cm−1), disaccharides (1462 cm−1) and proteins (amide I, 1672/5 cm−1) were associated with Barrett’s oesophagus and OAC tissue differences. These findings show the potential of using Raman microspectroscopy imaging for fast and accurate diagnoses of oesophageal pathologies and establishing subtle molecular changes predisposing to adenocarcinoma in a clinical setting
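
The PCA-LDA step described in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the number of components, the equal-prior decision threshold, and the synthetic "spectra" are all assumptions made for the example.

```python
import numpy as np

def pca_lda_fit(X, y, n_components=5):
    """Fit PCA followed by two-class LDA; y must hold the labels 0 and 1."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # PCA via SVD of the mean-centred data: rows of Vt are the loadings.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    loadings = Vt[:n_components].T              # (n_variables, n_components)
    scores = Xc @ loadings
    m0 = scores[y == 0].mean(axis=0)
    m1 = scores[y == 1].mean(axis=0)
    # Pooled within-class covariance of the PCA scores.
    S = np.zeros((n_components, n_components))
    for label, m in ((0, m0), (1, m1)):
        d = scores[y == label] - m
        S += d.T @ d
    S /= len(y) - 2
    w = np.linalg.solve(S, m1 - m0)             # LDA direction
    threshold = w @ (m0 + m1) / 2               # assumes equal class priors
    return {"mean": mean, "loadings": loadings, "w": w, "threshold": threshold}

def pca_lda_predict(model, X):
    """Project new spectra onto the PCA loadings and apply the LDA rule."""
    scores = (X - model["mean"]) @ model["loadings"]
    return (scores @ model["w"] > model["threshold"]).astype(int)

# Synthetic stand-in data: class 1 carries one extra band (hypothetical).
rng = np.random.default_rng(0)
axis = np.arange(100)
band = np.exp(-0.5 * ((axis - 45) / 3.0) ** 2)
y = np.repeat([0, 1], 20)
X = rng.normal(0.0, 0.2, size=(40, 100)) + np.outer(y, band)

model = pca_lda_fit(X, y, n_components=3)
accuracy = float(np.mean(pca_lda_predict(model, X) == y))
print(f"training accuracy: {accuracy:.2f}")
```

In practice the accuracy would be estimated on held-out or cross-validated spectra rather than on the training set used here.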

    Variable selection towards classification of digital images: identification of altered glucose levels in serum

    classed as 125 mg/dL). Herein, we propose a method to identify control, pre-diabetic, or diabetic simulated and real-world samples based on their glucose levels using classification-based variable selection algorithms [successive projections algorithm (SPA) or genetic algorithm (GA)] coupled to linear discriminant analysis (SPA-LDA and GA-LDA) towards analyzing red–green–blue digital images. Images were recorded after glucose enzymatic reaction, whereby 250 μL of reactant content of samples were captured by using a common cell phone camera. Processing was applied to the images at a pixel level, where 72.2% of the pixels were correctly classified as control, 79.2% as pre-diabetic, and 90.9% as diabetic using the SPA-LDA algorithm; and 76.8% as control, 81.4% as pre-diabetic, and 91.7% as diabetic using the GA-LDA algorithm in the validation set containing nine simulated samples. Eight real-world samples were measured as an external test set, where the accuracy using GA-LDA was found to be 92%, with sensitivities ranging from 70% to 100% and specificities ranging from 90% to 99%. This method shows the potential of variable selection techniques coupled with digital image analysis towards blood glucose monitoring

    TTWD-DA: A MATLAB toolbox for discriminant analysis based on trilinear three-way data

    Three-way trilinear data is increasingly used in chemical and biochemical applications. This type of data is composed of three-way structures representing two different signal responses and one sample dimension distributed among a 3D structure, such as the data represented by fluorescence excitation emission matrices (EEMs), spectral-pH responses, spectral-kinetic responses, and spectral-electric potential responses, among others. Herein, we describe a new MATLAB toolbox for classification of trilinear three-way data using discriminant analysis techniques (linear discriminant analysis [LDA], quadratic discriminant analysis [QDA], and partial least squares discriminant analysis [PLS-DA]), termed “TTWD-DA”. These discrimination techniques were coupled to multivariate deconvolution techniques by means of parallel factor analysis (PARAFAC) and the Tucker3 algorithm. The toolbox is based on a user-friendly graphical interface, where these algorithms can be easily applied. Also, as output, multiple figures of merit are automatically calculated, such as accuracy, sensitivity and specificity. This software is freely available online
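
For readers unfamiliar with the figures of merit named above, the sketch below shows the standard definitions of accuracy, sensitivity and specificity from a two-class confusion matrix. It is written in Python rather than MATLAB and is not taken from the toolbox; the function name and the counts are hypothetical.

```python
def figures_of_merit(tp, tn, fp, fn):
    """Return (accuracy, sensitivity, specificity) from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)    # fraction of positives correctly identified
    specificity = tn / (tn + fp)    # fraction of negatives correctly identified
    return accuracy, sensitivity, specificity

# Hypothetical counts: 20 positive and 20 negative samples.
acc, sens, spec = figures_of_merit(tp=18, tn=19, fp=1, fn=2)
print(f"accuracy={acc:.3f}, sensitivity={sens:.3f}, specificity={spec:.3f}")
```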

    ATR-FTIR spectroscopy for virus identification: A powerful alternative

    In pandemic times, like the one we are witnessing for COVID-19, the discussion about new efficient and rapid techniques for diagnosis of diseases is more evident. In this mini-review, we present to the virological scientific community the potential of attenuated total reflection Fourier-transform infrared (ATR-FTIR) spectroscopy as a diagnosis technique. Herein, we explain the operation of this technique, as well as its advantages over standard methods. In addition, we also present the multivariate analysis tools that can be used to extract useful information from the data towards classification purposes. Tools such as Principal Component Analysis (PCA), Successive Projections Algorithm (SPA), Genetic Algorithm (GA) and Linear and Quadratic Discriminant Analysis (LDA and QDA) are covered, including examples of published studies. Finally, the advantages and disadvantages of ATR-FTIR spectroscopy are emphasized, as well as future prospects in this field of study that is only growing. One of the main aims of this paper is to encourage the scientific community to explore the potential of this spectroscopic tool to detect changes in biological samples such as those caused by the presence of viruses

    A three-dimensional principal component analysis approach for exploratory analysis of hyperspectral data: identification of ovarian cancer samples based on Raman microspectroscopy imaging of blood plasma

    Hyperspectral imaging is a powerful tool to obtain both chemical and spatial information of biological systems. However, few algorithms are capable of working with full three-dimensional images, in which reshaping or averaging procedures are often performed to reduce the data complexity. Herein, we propose a new algorithm of three-dimensional principal component analysis (3D-PCA) for exploratory analysis of complete 3D spectrochemical images obtained through Raman microspectroscopy. Blood plasma samples of ten patients (5 healthy controls, 5 diagnosed with ovarian cancer) were analysed by acquiring hyperspectral imaging in the fingerprint region (∼780–1858 cm−1). Results show that 3D-PCA can clearly differentiate both groups based on its scores plot, where higher loadings coefficients were observed in amino acids, lipids and DNA regions. 3D-PCA is a new methodology for exploratory analysis of hyperspectral imaging, providing fast information for class differentiation

    Age-Related and Gender-Related Increases in Colorectal Cancer Mortality Rates in Brazil Between 1979 and 2015: Projections for Continuing Rises in Disease

    Purpose: Brazil is the largest country in South America. Although a developing nation, birth rates have been decreasing in the last few decades, while its overall population is undergoing lifestyle changes and ageing significantly. Moreover, Brazil has had increasingly high mortality rates related to colorectal cancer (CRC). Herein, we investigated whether the Brazilian population is exhibiting increasing mortality rates related to colon cancer (CC) or rectal cancer (RC) in recent years. Methods: We examined data from the Brazilian Federal Government from 1979 to 2015 to determine whether CRC mortality and the population ageing process may be associated. Results: Our mathematical modelling suggests that mortality rates related to CC and RC events in the Brazilian population may increase by 79% and 66% in the next 24 years, respectively. This finding led us to explore the mortality rates for both diseases in the country, and we observed that the highest levels were in the south and southeast regions from the year 2000 onwards. CC events appear to decrease life expectancy among people during their second decade of life in recent years, whereas RC events induced decreases in life expectancy in those aged >30 years. Additionally, both CC and RC events seem to promote significant mortality rates in the male population aged >60 years and living in the southern states. Conclusion: Our dataset suggests that both CC and RC events may lead to a significantly increasing number of deaths in the Brazilian male population in coming years

    Uncertainty estimation and misclassification probability for classification models based on discriminant analysis and support vector machines

    Uncertainty estimation provides a quantitative value of the predictive performance of a classification model based on its misclassification probability. Low misclassification probabilities are associated with a low degree of uncertainty, indicating high trustworthiness, while high misclassification probabilities are associated with a high degree of uncertainty, indicating a high susceptibility to generating incorrect classifications. Herein, misclassification probability estimations based on uncertainty estimation by bootstrap were developed for classification models using discriminant analysis [linear discriminant analysis (LDA) and quadratic discriminant analysis (QDA)] and support vector machines (SVM). Principal component analysis (PCA) was used as a variable reduction technique prior to classification. Four spectral datasets were tested (1 simulated and 3 real applications) for binary and ternary classifications. Models with lower misclassification probabilities were more stable when the spectra were perturbed with white Gaussian noise, indicating better robustness. Thus, misclassification probability can be used as an additional figure of merit to assess model robustness, providing a reliable metric to evaluate the predictive performance of a classifier
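
One way to read the bootstrap idea above is sketched below: the training set is resampled with replacement, the classifier is refitted on each resample, and the fraction of refits that mislabel a sample is taken as its misclassification probability. The abstract does not give the exact procedure, so this is only an assumed interpretation. A nearest-class-mean classifier stands in for LDA, QDA or SVM to keep the example short, and all names and data are hypothetical.

```python
import numpy as np

def nearest_mean_predict(X_train, y_train, X_new):
    """Assign each new sample to the class with the closest training mean."""
    classes = np.unique(y_train)
    means = np.stack([X_train[y_train == c].mean(axis=0) for c in classes])
    dist = ((X_new[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
    return classes[dist.argmin(axis=1)]

def bootstrap_misclassification(X_train, y_train, X_test, y_test,
                                n_boot=200, seed=0):
    """Per-sample fraction of bootstrap refits that mislabel each test sample."""
    rng = np.random.default_rng(seed)
    n = len(y_train)
    n_classes = len(np.unique(y_train))
    wrong = np.zeros(len(y_test))
    used = 0
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)        # resample with replacement
        if len(np.unique(y_train[idx])) < n_classes:
            continue                            # skip resamples missing a class
        pred = nearest_mean_predict(X_train[idx], y_train[idx], X_test)
        wrong += pred != y_test
        used += 1
    return wrong / used

# Synthetic two-class data in two dimensions (hypothetical).
rng = np.random.default_rng(1)
X_train = np.vstack([rng.normal(0.0, 1.0, (30, 2)), rng.normal(5.0, 1.0, (30, 2))])
y_train = np.repeat([0, 1], 30)
X_test = np.array([[0.0, 0.0], [5.0, 5.0], [2.5, 2.5]])
y_test = np.array([0, 1, 0])

probs = bootstrap_misclassification(X_train, y_train, X_test, y_test)
print(probs)
```

Samples close to a class centre should receive a probability near zero, while the third sample, which sits between the two classes, is expected to receive an intermediate value.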

    Colourimetric Determination of High-Density Lipoprotein (HDL) Cholesterol using Red-Green-Blue Digital Colour Imaging

    A rapid, low-cost and sensitive method for quantification of high-density lipoprotein (HDL) cholesterol based on enzymatic colorimetric reactions and digital image analysis was developed. The proposed method was adapted to a 96-microwell enzyme-linked immunosorbent assay (ELISA) plate and image acquisition was performed using a conventional desktop scanner. The images were recorded using the red-green-blue (RGB) colour system, in which the resolved absorbance for each colour channel was used for multiple linear regression. The regression model presented a root mean square error of calibration and R² value of 1.53 mg dL−1 and 0.995, respectively. Prediction was obtained with a root mean square error of prediction of 2.42 mg dL−1 and R² of 0.993, therefore showing a good prediction response. A limit of detection of 0.43 mg dL−1 and precision better than 1.72% reinforced these results. This method was compared with a reference methodology using UV-Vis measurements at 500 nm and no statistical difference was observed at a confidence level of 95%, showing its potential for future clinical applications
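
The regression step described above can be illustrated with a short sketch. It assumes that the per-channel absorbance is computed as the negative base-10 logarithm of the ratio between sample and blank intensities, which the abstract does not state explicitly. The function names, channel sensitivities and data are hypothetical, and the error is computed as a plain root mean square without a degrees-of-freedom correction.

```python
import numpy as np

def rgb_absorbance(rgb, rgb_blank):
    """Per-channel absorbance, assuming A = -log10(I / I_blank)."""
    return -np.log10(np.asarray(rgb, dtype=float) / np.asarray(rgb_blank, dtype=float))

def fit_mlr(A, conc):
    """Least-squares fit of conc = b0 + b1*A_R + b2*A_G + b3*A_B."""
    X = np.column_stack([np.ones(len(A)), A])
    coef, *_ = np.linalg.lstsq(X, conc, rcond=None)
    return coef

def predict_mlr(coef, A):
    return coef[0] + A @ coef[1:]

def rmse(y_true, y_pred):
    """Root mean square error between reference and predicted values."""
    return float(np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)))

# Synthetic calibration set (hypothetical concentrations and sensitivities).
rng = np.random.default_rng(0)
conc = np.linspace(10.0, 100.0, 10)             # e.g. mg dL-1
k = np.array([0.001, 0.004, 0.002])             # assumed channel sensitivities
blank = np.array([250.0, 250.0, 250.0])
intensity = blank * 10 ** (-np.outer(conc, k)) + rng.normal(0.0, 0.5, (10, 3))

A = rgb_absorbance(intensity, blank)
coef = fit_mlr(A, conc)
rmsec = rmse(conc, predict_mlr(coef, A))
print(f"RMSEC = {rmsec:.2f}")
```

A root mean square error of prediction would be obtained the same way by applying `predict_mlr` to samples that were not used in the fit.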

    Assessment of macadamia kernel quality defects by means of near infrared spectroscopy (NIRS) and nuclear magnetic resonance (NMR)

    Macadamia kernels are visually sorted based on the presence of quality defects by specialized laborers. However, this process is not as accurate as non-destructive methods such as near infrared spectroscopy (NIRS) and nuclear magnetic resonance (NMR). Thus, NIRS and NMR in combination with chemometrics have become established non-destructive methods for rapid assessment of quality parameters in the food and agricultural sectors. Therefore, the quality of macadamia kernels was assessed by NIRS and NMR using chemometric tools such as PCA-LDA and GA-LDA to evaluate external kernel defects. Macadamia kernels were classified as: 1 = good, marketable kernels without defects; 2 = kernels with discoloration; 3 = immature kernels; 4 = kernels affected by mold; and 5 = kernels with insect damage. Using NIRS, GA-LDA resulted in an accuracy and specificity of 97.8% and 100%, respectively, to classify good kernels. On the other hand, the PCA-LDA technique resulted in an accuracy higher than 68% and specificity of 97.2% to classify immature kernels. For NMR, PCA-LDA resulted in an accuracy higher than 83% and GA-LDA resulted in an accuracy of 100%, both to classify kernels with insect damage. NIRS and NMR spectroscopy can be successfully used to classify unshelled macadamia kernels based on the defects. However, NIRS outperformed NMR based on the higher accuracy results